\section{Digging in the sources}

This section is dedicated to people who want to dig into Metalua
sources. It presents the main files, their current state, and where to
start your exploration.

\subsection{gg}

The real core of Metalua is gg, implemented in a single file {\bf
  gg.lua}. Understanding it is really the key to getting the big
picture. gg is written in a rather functional style, with a lot of
functors and closures.

Gg is a restricted version of Haskell's parser combinator library
parsec. Parsec lets you build complex parsers by combining simpler
ones with concatenation, alternative choices, etc. Among other
things, it handles backtracking: a given parser can have several
interpretations of a single sentence, and parser combinators handle
this non-determinism by choosing the interpretation which allows the
combined parser to yield a result.

Gg intentionally doesn't support backtracking: not only would it be
slightly harder to read in a non-lazy language such as Lua, but it
isn't required to parse Lua. More importantly, built-in backtracking
would encourage people to create ambiguous syntax extensions, which is
an awfully bad idea: we expect different extensions to coexist as
smoothly as possible, and two extensions with ambiguous grammars
generally cause a lot of chaos when mixed together. Finally, a lot of
Lua's essence is about friendly, unsurprising, clear syntax. We want
to encourage people to respect this spirit as much as possible, so if
they want to introduce chaotic syntax, Metalua won't actively prevent
them from doing so, but it certainly won't help by providing the
tools.

Gg offers no atomic parsers besides keyword parser generators; it's up
to the programmer to provide them. Parsers are simply functions which
take a lexer as a parameter, and return an AST. Examples of such
functions are provided for mlp in mlp\_expr.lua. Lexers are objects
with a few mandatory methods: peek, next and is\_keyword. The lexer
API is discussed in the part about mll.
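
To make the ``function from lexer to AST'' idea concrete, here is a
minimal sketch. It is not taken from gg.lua: the toy lexer
{\tt new\_lexer}, the parsers {\tt number} and {\tt sum}, and the
token and AST shapes are all assumptions made for illustration. Only
the three method names (peek, next, is\_keyword) come from the text
above, and their exact signatures in Metalua may differ.

\begin{verbatim}
-- Hypothetical toy lexer: NOT Metalua's lexer, only an object exposing
-- the three methods named above, fed with a pre-built token list.
local function new_lexer(tokens)
   local lx = { tokens = tokens, pos = 1 }
   -- Return the next token without consuming it.
   function lx:peek()
      return self.tokens[self.pos] or { tag = "Eof" }
   end
   -- Return the next token and consume it.
   function lx:next()
      local tok = self:peek()
      self.pos = self.pos + 1
      return tok
   end
   -- Return the keyword if tok is a keyword (optionally a specific
   -- one), false otherwise.
   function lx:is_keyword(tok, word)
      if tok.tag ~= "Keyword" then return false end
      if word == nil or tok[1] == word then return tok[1] end
      return false
   end
   return lx
end

-- An atomic parser: a plain function from lexer to AST.
local function number(lx)
   local tok = lx:next()
   assert(tok.tag == "Number", "number expected")
   return { tag = "Number", tok[1] }
end

-- A parser built on top of it. It never backtracks: a single peeked
-- token decides whether the sum goes on.
local function sum(lx)
   local ast = number(lx)
   while lx:is_keyword(lx:peek(), "+") do
      lx:next() -- consume the "+"
      ast = { tag = "Op", "add", ast, number(lx) }
   end
   return ast
end

local lx = new_lexer {
   { tag = "Number", 1 }, { tag = "Keyword", "+" }, { tag = "Number", 2 } }
local ast = sum(lx)
\end{verbatim}

Since {\tt sum} only looks one token ahead before committing, it
illustrates the no-backtracking discipline described above; gg's
actual combinators are more general than this hand-written loop.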

\paragraph{State} 
gg.lua is correctly refactored and commented, and should be readable
by anyone with some notions of Lua and functional programming. Having
dealt with parsec might help a bit, but is definitely not required.

\paragraph{Going further} 
From gg, there are two main leads to follow: either look down to mll,
Metalua's lexer, or look up to mlp, the Metalua parser implemented on
top of gg.

\subsection{lexer, mlp\_lexer}

As stated above, gg relies on a lexer respecting a certain API. We
provide such a lexer for Metalua parsing, which deals with the usual
lexing tasks: getting rid of spaces and comments, and discriminating
between keywords, identifiers, strings, numbers etc. It's implemented
in the file {\bf lexer.lua}.

This lexer can be parameterized with a list of keywords; {\bf
  mlp\_lexer.lua} defines the mlp lexer, i.e. uses the generic lexer to
create a derived lexer, and adds Lua keywords to it.
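
The ``generic lexer plus keyword list'' arrangement can be sketched
as follows. This is an illustration under assumptions, not the code
of lexer.lua or mlp\_lexer.lua: the names {\tt generic}, {\tt clone},
{\tt add} and {\tt tokenize} are hypothetical, and the tokenizer
merely splits on whitespace.

\begin{verbatim}
-- Hypothetical generic lexer, with an initially empty keyword set.
local generic = { keywords = {} }

-- Create a derived lexer: it inherits the methods of its parent and
-- starts with a private copy of the parent's keyword set.
function generic:clone()
   local child = setmetatable({ keywords = {} }, { __index = self })
   for word in pairs(self.keywords) do child.keywords[word] = true end
   return child
end

-- Register additional keywords on this lexer only.
function generic:add(words)
   for _, word in ipairs(words) do self.keywords[word] = true end
end

-- Split the source on whitespace and tag each word.
function generic:tokenize(src)
   local tokens = {}
   for word in src:gmatch("%S+") do
      local tag = self.keywords[word] and "Keyword" or "Id"
      tokens[#tokens + 1] = { tag = tag, word }
   end
   return tokens
end

-- A derived lexer which knows a few Lua keywords.
local lua_lexer = generic:clone()
lua_lexer:add { "local", "function", "end", "return" }

local derived_tokens = lua_lexer:tokenize("local x")
local generic_tokens = generic:tokenize("local x")
\end{verbatim}

With this layout, {\tt lua\_lexer} tags {\tt local} as a keyword
while {\tt generic} still sees an identifier, so a derived lexer can
be extended without disturbing the one it was cloned from.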

\paragraph{State}
lexer.lua is somewhat extensible by someone willing to inspect its
sources carefully. Among other things, playing with the list
lexer.extractors will let you create readers for new lexical
entities. However, the global architecture of the lexer still
deserves a serious refactoring.

\subsection{mlp}
Mlp is a very important part of Metalua, the one most people will
actually have to deal with. Understanding it requires understanding
gg. Mlp is cut into several parts:

\begin{itemize}
\item {\bf mlp\_expr.lua} parses expressions except literal tables
  and quotes. It includes other constants (booleans, strings,
  numbers), the compound expressions built by combining them with
  operators, and function bodies. Most of its complexity is handled by
  the expression parser generator gg.expr.
\item {\bf mlp\_table.lua} parses tables. There is not much to say
  about it: this is probably the simplest subpart of mlp.
\item {\bf mlp\_stat.lua} parses statements. Except for assignments,
  every kind of statement is introduced by a distinct initial
  keyword, and it should remain that way.
\item {\bf mlp\_ext.lua} gathers the parts of the Metalua syntax that
  aren't regular Lua: customizable assignments, short lambda syntax
  etc.
\item {\bf mlp\_meta.lua} handles the meta-operations splicing and
  quoting.
\item {\bf mlp\_misc.lua} contains various little bits that wouldn't
  fit anywhere else. In other words, it's sort of a mess.
\end{itemize}
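
The rule that every statement except assignment starts with its own
keyword means a statement parser can choose what to do from the first
token alone. The sketch below shows that dispatch in isolation. It is
not the code of mlp\_stat.lua: {\tt stat\_parsers}, {\tt stat}, the
{\tt unless} extension and the simplified AST shapes are assumptions,
and statements are given as pre-split lists of words to keep the
example short.

\begin{verbatim}
-- Hypothetical table of statement parsers, indexed by initial keyword.
local stat_parsers = {
   ["return"] = function(words) return { tag = "Return", words[2] } end,
   ["break"]  = function(words) return { tag = "Break" } end,
}

-- Parse one statement: dispatch on the first word, and fall back to
-- an assignment "lhs = rhs" when that word is not a known keyword.
local function stat(words)
   local parser = stat_parsers[words[1]]
   if parser then return parser(words) end
   assert(words[2] == "=", "statement expected")
   return { tag = "Set", words[1], words[3] }
end

-- An extension registers itself under its own, new keyword, so it
-- cannot change the meaning of the existing statements.
stat_parsers["unless"] = function(words)
   return { tag = "Unless", words[2] }
end

local a = stat { "return", "x" }
local b = stat { "x", "=", "y" }
local c = stat { "unless", "done" }
\end{verbatim}

Two extensions collide here only if they claim the same initial
keyword, which is easy to detect. That is one reason to keep the
``one keyword per statement'' rule.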

\subsection{Bytecode generation}
Bytecode generation by Metalua is a quick and dirty hack, which
sort-of does the trick for now, but should eventually be largely
rewritten. The current scaffolding has been hacked from Kein-Hong
Man's Yueliang project (\url{http://luaforge.net/projects/yueliang}),
but Yueliang's design rationale doesn't really fit Metalua's
needs. Indeed, Yueliang is a translation, almost statement by
statement, of the official C compiler included in the Lua
distribution. This has the following consequences:

\begin{itemize}
\item it helps to understand the original compiler in C;
\item it's easy to backport extensions from the C version to Yueliang
  (I had to do it, since Yueliang was 5.0 and I needed 5.1);
\item it's rather easy to get bitwise-identical compilations between
  what Yueliang produces and what the C version does. And that's good,
  because testing a compiler is a non-trivial problem;
\item being in Lua, it's much easier to debug and extend than C.
\end{itemize}

The big drawback is that the code is very much structured like C, and
is therefore big, memory and time hungry, harder than necessary to
understand and maintain, and not suitable for clean and easy
extensions. Two possible evolutions could be considered for Metalua:

\begin{itemize}
\item either port the bytecode producer to C, to speed up things;
\item or rewrite it in ``true'' (meta)Lua\footnote{Pure Lua would
    probably be a much better idea: it would keep bootstrapping issues
    trivial.}, i.e. by leveraging all the power of tables, closures
  etc. This would allow the bytecode producer to be extensible, and
  interesting things could be done with that. Whether it's a good
  idea, I don't know. The drawback is that it would be much harder to
  keep compatibility with the next Lua VM versions.
\end{itemize}

So, the files borrowed from Yueliang are {\bf lopcode.lua}, {\bf
  lcode.lua}, and {\bf ldump.lua}. They've been quickly hacked to
produce 5.1 bytecode rather than 5.0. Finally, {\bf compile.lua},
which uses the previous modules to convert AST to bytecode, is a
grossly abused version of Yueliang's lparse.lua.

Notice a big current limitation: it only dumps bytecode for
little-endian platforms with double-precision floats and 32-bit
integers. 64-bit and/or big-endian platforms are currently not
supported (although it shouldn't be hard to extend ldump.lua to handle
those).
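
To give an idea of what handling both byte orders involves, here is a
minimal sketch of an endianness-aware integer dumper. It is not taken
from ldump.lua: the function {\tt dump\_int32} and its
{\tt big\_endian} flag are hypothetical, and it only covers
non-negative integers below $2^{32}$.

\begin{verbatim}
-- Hypothetical helper: encode a non-negative integer below 2^32 as a
-- 4-byte string, little endian by default, big endian on request.
local function dump_int32(n, big_endian)
   local bytes = {}
   for i = 1, 4 do
      bytes[i] = string.char(n % 256) -- least significant byte first
      n = math.floor(n / 256)
   end
   local s = table.concat(bytes)
   if big_endian then s = s:reverse() end
   return s
end

local le = dump_int32(258)       -- 258 = 1 * 256 + 2
local be = dump_int32(258, true)
\end{verbatim}

A complete port would also have to emit a matching bytecode header
and handle other integer and number sizes, which this sketch leaves
out.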
 
\subsection{The bootstrapping process}

FIXME
